28 research outputs found

    Duration learning for analysis of nanopore ionic current blockades

    Abstract. Background: Ionic current blockade signal processing, for use in nanopore detection, offers a promising new way to analyze single molecule properties, with potential implications for DNA sequencing. The alpha-Hemolysin transmembrane channel interacts with a translocating molecule in a nontrivial way, frequently evidenced by a complex ionic flow blockade pattern. Typically, recorded current blockade signals have several levels of blockade, with various durations, all obeying a fixed statistical profile for a given molecule. Hidden Markov Model (HMM) based duration learning experiments on artificial two-level Gaussian blockade signals helped us to identify a proper modeling framework. We then apply our framework to the real multi-level DNA hairpin blockade signal. Results: The identified upper level blockade state is observed with durations that are geometrically distributed (consistent with a physical decay process for remaining in any given state). We show that a mixture of convolution chains of geometrically distributed states is better suited to representing multimodal long-tailed duration phenomena. Based on learned HMM profiles we are able to classify 9 base-pair DNA hairpins with accuracy up to 99.5% on signals from same-day experiments. Conclusion: We have demonstrated several implementations for de novo estimation of the duration distribution probability density function within an HMM framework and applied our model topology to the real data. The proposed design could be useful for molecular analysis based on nanopore current blockade signals.
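    The duration modeling described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the self-transition probability of 0.8, and the mixture weights are all hypothetical choices made for illustration. The sketch assumes that a single HMM state with self-transition probability `stay_prob` has a geometric dwell time, and that visiting several such sub-states in series (a "convolution chain") gives a dwell time that is the sum of geometric variables, whose mode can lie away from 1.

```python
import random
from math import comb


def sample_geometric_duration(stay_prob, rng):
    """Steps spent in one state that repeats itself with probability stay_prob."""
    duration = 1
    while rng.random() < stay_prob:
        duration += 1
    return duration


def sample_chain_duration(stay_prob, n_substates, rng):
    """Dwell time of n_substates identical geometric sub-states visited in series."""
    return sum(sample_geometric_duration(stay_prob, rng) for _ in range(n_substates))


def geometric_pmf(d, stay_prob):
    """P(duration = d) for a single state, d = 1, 2, ..."""
    return stay_prob ** (d - 1) * (1.0 - stay_prob)


def chain_pmf(d, stay_prob, n_substates):
    """P(duration = d) for the chain: a sum of n_substates geometric variables."""
    if d < n_substates:
        return 0.0  # each sub-state is occupied for at least one step
    return (comb(d - 1, n_substates - 1)
            * stay_prob ** (d - n_substates)
            * (1.0 - stay_prob) ** n_substates)


def mixture_pmf(d, components):
    """Mixture of chains; components is a list of (weight, stay_prob, n_substates)."""
    return sum(w * chain_pmf(d, p, n) for w, p, n in components)


if __name__ == "__main__":
    # Hypothetical mixture: a short geometric component plus a longer 3-state chain.
    components = [(0.5, 0.8, 1), (0.5, 0.8, 3)]
    for d in (1, 5, 10, 20, 40):
        print(d, geometric_pmf(d, 0.8), chain_pmf(d, 0.8, 3), mixture_pmf(d, components))
```

    Under these assumptions, the single-state distribution always peaks at a duration of 1, while chains and mixtures of chains can place their mass elsewhere, which is the flexibility the abstract attributes to the mixture-of-chains topology.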

    A novel, fast, HMM-with-Duration implementation – for application with a new, pattern recognition informed, nanopore detector

    Abstract. Background: Hidden Markov Models (HMMs) provide an excellent means for structure identification and feature extraction on stochastic sequential data. An HMM-with-Duration (HMMwD) is an HMM that can also exactly model the hidden-label length (recurrence) distributions, whereas the regular HMM imposes a best-fit geometric distribution in its modeling/representation. Results: A novel, fast HMM-with-Duration (HMMwD) implementation is presented, and experimental results are shown that demonstrate its performance on two-state synthetic data designed to model Nanopore Detector data. The HMMwD experimental results are compared to (i) the ideal model and (ii) the conventional HMM. Its accuracy is clearly an improvement over the standard HMM, and matches that of the ideal solution in many cases where the standard HMM does not. Computationally, the new HMMwD has all the speed advantages of the conventional (simpler) HMM implementation. In preliminary work shown here, HMM feature extraction is then used to establish the first pattern recognition-informed (PRI) sampling control of a Nanopore Detector Device (on a "live" data-stream). Conclusion: The improved accuracy of the new HMMwD implementation, at the same order of computational cost as the standard HMM, is an important augmentation for applications in gene structure identification and channel current analysis, especially applications such as PRI sampling control, where speed is essential. The PRI experiment was designed to inherit the high accuracy of the well characterized and distinctive blockades of the DNA hairpin molecules used as controls (or blockade "test-probes"). For this test set, the accuracy inherited is 99.9%.

    Gene expression during normal and FSHD myogenesis

    Abstract. Background: Facioscapulohumeral muscular dystrophy (FSHD) is a dominant disease linked to contraction of an array of tandem 3.3-kb repeats (D4Z4) at 4q35. Within each repeat unit is a gene, DUX4, that can encode a protein containing two homeodomains. A DUX4 transcript derived from the last repeat unit in a contracted array is associated with pathogenesis but it is unclear how. Methods: Using exon-based microarrays, the expression profiles of myogenic precursor cells were determined. Both undifferentiated myoblasts and myoblasts differentiated to myotubes derived from FSHD patients and controls were studied after immunocytochemical verification of the quality of the cultures. To further our understanding of FSHD and normal myogenesis, the expression profiles obtained were compared to those of 19 non-muscle cell types analyzed by identical methods. Results: Many of the ~17,000 examined genes were differentially expressed (> 2-fold, p < 0.01) in control myoblasts or myotubes vs. non-muscle cells (2185 and 3006, respectively) or in FSHD vs. control myoblasts or myotubes (295 and 797, respectively). Surprisingly, despite the morphologically normal differentiation of FSHD myoblasts to myotubes, most of the disease-related dysregulation was seen as dampening of normal myogenesis-specific expression changes, including in genes for muscle structure, mitochondrial function, stress responses, and signal transduction. Other classes of genes, including those encoding extracellular matrix or pro-inflammatory proteins, were upregulated in FSHD myogenic cells independent of an inverse myogenesis association. Importantly, the disease-linked DUX4 RNA isoform was detected by RT-PCR in FSHD myoblast and myotube preparations only at extremely low levels. Unique insights into myogenesis-specific gene expression were also obtained. For example, all four Argonaute genes involved in RNA-silencing were significantly upregulated during normal (but not FSHD) myogenesis relative to non-muscle cell types. Conclusions: DUX4's pathogenic effect in FSHD may occur transiently at or before the stage of myoblast formation to establish a cascade of gene dysregulation. This contrasts with the current emphasis on toxic effects of experimentally upregulated DUX4 expression at the myoblast or myotube stages. Our model could explain why DUX4's inappropriate expression was barely detectable in myoblasts and myotubes but nonetheless linked to FSHD.

    A Metastate HMM with Application to Gene Structure Identification in Eukaryotes

    We introduce a generalized-clique hidden Markov model (HMM) and apply it to gene finding in eukaryotes (C. elegans). We demonstrate an HMM structure identification platform that is novel and robustly performing in a number of ways. The generalized-clique HMM begins by enlarging the primitive hidden states associated with the individual base labels (as exon, intron, or junk) to substrings of primitive hidden states, or footprint states, having a minimal length greater than the footprint state length. The emissions are likewise expanded to higher order in the fundamental joint probability that is the basis of the generalized-clique, or "metastate", HMM. We then consider application to eukaryotic gene finding and show how such a metastate HMM improves the strength of coding/noncoding-transition contributions to gene-structure identification. We describe situations where the coding/noncoding-transition modeling can effectively recapture the exon and intron heavy-tail distribution modeling capability as well as manage the exon-start needle-in-the-haystack problem. In analysis of the C. elegans genome we show that the sensitivity and specificity (SN, SP) results for both the individual-state and full-exon predictions are greatly enhanced over the standard HMM when using the generalized-clique HMM.

    Myogenic Differential Methylation: Diverse Associations with Chromatin Structure

    Employing a new algorithm for identifying differentially methylated regions (DMRs) from reduced representation bisulfite sequencing profiles, we identified 1972 hypermethylated and 3250 hypomethylated myogenic DMRs in a comparison of myoblasts (Mb) and myotubes (Mt) with 16 types of nonmuscle cell cultures. DMRs co-localized with a variety of chromatin structures, as deduced from ENCODE whole-genome profiles. Myogenic hypomethylation was highly associated with both weak and strong enhancer-type chromatin, while hypermethylation was infrequently associated with enhancer-type chromatin. Both myogenic hypermethylation and hypomethylation often overlapped weak transcription-type chromatin and Polycomb-repressed-type chromatin. For representative genes, we illustrate relationships between DNA methylation, the local chromatin state, DNaseI hypersensitivity, and gene expression. For example, MARVELD2 exhibited myogenic hypermethylation in transcription-type chromatin that overlapped a silenced promoter in Mb and Mt, while TEAD4 had myogenic hypomethylation in intronic subregions displaying enhancer-type or transcription-type chromatin in these cells. For LSP1, alternative promoter usage and active promoter-type chromatin were linked to highly specific myogenic or lymphogenic hypomethylated DMRs. Lastly, despite its myogenesis-associated expression, TBX15 had multiple hypermethylated myogenic DMRs framing its promoter region. This could help explain why TBX15 was previously reported to be underexpressed and, unexpectedly, its promoter undermethylated in placentas exhibiting vascular intrauterine growth restriction.

    Average Viterbi Decoding Accuracy over 10 different trials (instances) of 10 k-length synthetic 3-level signal data, where all levels have identical Poisson duration but the separation (Gaussian emission means) between the levels varies

    Copyright information: Taken from "A novel, fast, HMM-with-Duration implementation – for application with a new, pattern recognition informed, nanopore detector" (http://www.biomedcentral.com/1471-2105/8/S7/S19). BMC Bioinformatics 2007;8(Suppl 7):S19-S19. Published online 1 Nov 2007. PMCID: PMC2099487. The Viterbi decoding accuracy improves as the number of bins increases in the decoding HMM's approximation of the Poisson durations generated using a 1 k-bin length distribution representation in the generating HMM. From left to right in each plot, the Viterbi response improves as the separation of the 3 levels (emission means) increases. Decoding performance when all levels have identical attributes is random 3-way guessing, so the expected 3333 out of 10000 correct is observed in all cases. Decoding performance with distributions with means 19 (for geometric), 19.25 and 19.5 (with Poisson distributed dwell-times).
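    The dependence of decoding accuracy on level separation described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: it uses a conventional HMM with geometric dwell times (a fixed self-transition probability) rather than the HMMwD with Poisson durations, and the function names, noise level, and stay probability are hypothetical.

```python
import math
import random


def generate_signal(means, sigma, stay_prob, length, rng):
    """Simulate hidden levels with geometric dwell times and Gaussian emissions."""
    n = len(means)
    state = rng.randrange(n)
    states, signal = [], []
    for _ in range(length):
        states.append(state)
        signal.append(rng.gauss(means[state], sigma))
        if rng.random() >= stay_prob:
            # Leave the current level for one of the others, chosen uniformly.
            state = rng.choice([s for s in range(n) if s != state])
    return states, signal


def viterbi(signal, means, sigma, stay_prob):
    """Most likely level sequence under a standard HMM, computed in log space."""
    n = len(means)
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (n - 1))

    def log_emit(x, s):
        # Gaussian log-density up to a constant shared by all states.
        return -0.5 * ((x - means[s]) / sigma) ** 2

    def log_trans(r, s):
        return log_stay if r == s else log_move

    score = [math.log(1.0 / n) + log_emit(signal[0], s) for s in range(n)]
    backpointers = []
    for x in signal[1:]:
        new_score, pointers = [], []
        for s in range(n):
            best_prev = max(range(n), key=lambda r: score[r] + log_trans(r, s))
            new_score.append(score[best_prev] + log_trans(best_prev, s) + log_emit(x, s))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def accuracy(true_states, decoded):
    """Fraction of positions where the decoded level matches the true level."""
    return sum(t == d for t, d in zip(true_states, decoded)) / len(true_states)


if __name__ == "__main__":
    rng = random.Random(0)
    for separation in (0.0, 0.5, 1.0, 2.0, 5.0):
        means = [0.0, separation, 2.0 * separation]
        states, signal = generate_signal(means, 1.0, 0.95, 10000, rng)
        decoded = viterbi(signal, means, 1.0, 0.95)
        print(separation, accuracy(states, decoded))
```

    With zero separation the emissions carry no information about the level, so no decoder can do better than chance on average, which corresponds to the 3-way guessing case in the caption. In this sketch the accuracy should rise as the separation grows relative to the noise.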

    Decoding performance with distributions with means 19 (for geometric), 20 and 21 (with Poisson distributed dwell-times)

    Copyright information: Taken from "A novel, fast, HMM-with-Duration implementation – for application with a new, pattern recognition informed, nanopore detector" (http://www.biomedcentral.com/1471-2105/8/S7/S19). BMC Bioinformatics 2007;8(Suppl 7):S19-S19. Published online 1 Nov 2007. PMCID: PMC2099487. Decoding performance with distributions with different mean separations, with a 1000-bin representation of the state dwell-time distribution. (See Fig. 11 caption for further details.)